The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life
Abstract
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.
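As a rough illustration of what a combined data-type matrix looks like, the sketch below concatenates per-family amino acid alignments and appends a binary gene presence/absence partition for each taxon. It is a toy example under stated assumptions, not the authors' pipeline: the function names, the taxon and family labels, the use of "?" for missing data, and the 0/1 coding are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def presence_absence_partition(genomes: Dict[str, Set[str]]) -> Dict[str, str]:
    """Code each gene family as '1' (present) or '0' (absent) per taxon.

    Families are ordered alphabetically so every taxon gets the same columns.
    """
    families = sorted(set().union(*genomes.values()))
    return {
        taxon: "".join("1" if fam in members else "0" for fam in families)
        for taxon, members in genomes.items()
    }


def concatenate_alignments(
    alignments: List[Dict[str, str]], taxa: List[str]
) -> Dict[str, str]:
    """Join aligned gene families end to end.

    A taxon lacking a family is padded with '?' (an assumed missing-data
    symbol) so that all rows keep the same length.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, "?" * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def build_mega_matrix(
    alignments: List[Dict[str, str]], genomes: Dict[str, Set[str]]
) -> Dict[str, str]:
    """Return one row per taxon: amino acid columns, then 0/1 columns."""
    taxa = sorted(genomes)
    amino_acids = concatenate_alignments(alignments, taxa)
    presence = presence_absence_partition(genomes)
    return {taxon: amino_acids[taxon] + presence[taxon] for taxon in taxa}


# Hypothetical toy input: three taxa, three gene families, two alignments.
toy_genomes = {
    "taxonA": {"fam1", "fam2"},
    "taxonB": {"fam2", "fam3"},
    "taxonC": {"fam1", "fam3"},
}
toy_alignments = [
    {"taxonA": "MKV", "taxonB": "MRV"},
    {"taxonB": "LL", "taxonC": "LI"},
]

toy_matrix = build_mega_matrix(toy_alignments, toy_genomes)
for taxon, row in toy_matrix.items():
    print(taxon, row)
```

In a real analysis the two partitions would normally be declared separately to the phylogenetic software so that each data type is treated appropriately; the flat string here only shows how the columns line up.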
Similar resources
Evolution of viruses and cells: do we need a fourth domain of life to explain the origin of eukaryotes?
The recent discovery of diverse very large viruses, such as the mimivirus, has fostered a profusion of hypotheses positing that these viruses define a new domain of life together with the three cellular ones (Archaea, Bacteria and Eucarya). It has also been speculated that they have played a key role in the origin of eukaryotes as donors of important genes or even as the structures at the origi...
Phylo SI: a new genome-wide approach for prokaryotic phylogeny
The evolutionary history of all life forms is usually represented as a vertical tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). The HGT creates widespread discordance between evolutionary histories of different genes as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been qu...
I-49: Human Y Chromosome Proteome Project
The success of the Human Genome Project (HGP) has provided a blueprint for the approximately 20,000 gene-encoded proteins potentially active in all of the hundreds of cell types that make up the human body. Yet we still have limited knowledge about a majority of the gene-encoded proteins which are the “building blocks of life” and “cellular machinery”. It is estimated that for nearly half of th...
Reconstructing evolutionary relationships from functional data: a consistent classification of organisms based on translation inhibition response.
The last two decades have witnessed an unsurpassed effort aimed at reconstructing the history of life from the genetic information contained in extant organisms. The availability of many sequenced genomes has allowed the reconstruction of phylogenies from gene families and its comparison with traditional single-gene trees. However, the appearance of major discrepancies between both approaches q...
Efficient inference of bacterial strain trees from genome-scale multilocus data
MOTIVATION: In bacterial evolution, inferring a strain tree, which is the evolutionary history of different strains of the same bacterium, plays a major role in analyzing and understanding the evolution of strongly isolated populations, population divergence and various evolutionary events, such as horizontal gene transfer and homologous recombination. Inferring a strain tree from multilocus dat...